Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU

Authors

  • Akihiko Kasagi
  • Koji Nakano
  • Yasuaki Ito
Abstract

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of shared memory access on GPUs. Bank conflicts must be avoided to maximize the bandwidth of shared memory access. Offline permutation of an array is the task of copying all elements of an array a into an array b according to a permutation given in advance. The main contribution of this paper is an implementation on a GPU of a conflict-free permutation algorithm for the DMM. We have also implemented straightforward permutation algorithms on the GPU. Experimental results for 1024 double (64-bit) numbers on an NVIDIA GeForce GTX-680 show that the straightforward permutation algorithm takes 247.8 ns for a random permutation and 1684 ns for the worst permutation, which involves the maximum number of bank conflicts. Our conflict-free permutation algorithm runs in 167 ns for any permutation, including the random and worst permutations, although it performs more memory accesses. It follows that our conflict-free permutation algorithm is 1.48 times faster for the random permutation and 10.0 times faster for the worst permutation.

Key words: memory machine models, data movement, bank conflict, shared memory, GPU, CUDA
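The effect of bank conflicts on a straightforward permutation can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a DMM-like model of width 32 in which address x resides in bank x mod 32, the threads of a warp issue their writes together, and a warp needs as many write stages as the most heavily loaded bank receives requests. The function names `permute` and `write_stages`, and the transpose pattern used as an example of a heavily conflicting permutation, are hypothetical choices made for illustration.

```python
# Sketch only: counts write stages of a straightforward permutation
# b[p[i]] = a[i] under an ASSUMED DMM-like bank model (width w = 32,
# address x in bank x % w, one warp = w consecutive threads).
from collections import Counter
import random


def permute(a, p):
    """Offline permutation: copy a[i] to b[p[i]] for every index i."""
    b = [None] * len(a)
    for i, x in enumerate(a):
        b[p[i]] = x
    return b


def write_stages(p, w=32):
    """Sum, over all warps, of the maximum number of writes hitting one bank."""
    total = 0
    for start in range(0, len(p), w):
        warp_destinations = p[start:start + w]
        per_bank = Counter(dst % w for dst in warp_destinations)
        total += max(per_bank.values())
    return total


n, w = 1024, 32

# Identity: each warp writes w consecutive addresses, one per bank.
identity = list(range(n))

# Transpose of a w x w layout: thread r*w + c writes to c*w + r, so all
# threads of a warp (fixed r) target the same bank r.
transpose = [(i % w) * w + (i // w) for i in range(n)]

# A reproducible random permutation (seed chosen arbitrarily).
shuffled = identity[:]
random.Random(0).shuffle(shuffled)

for name, p in [("identity", identity), ("random", shuffled), ("transpose", transpose)]:
    print(name, write_stages(p, w))
```

Under these assumptions the identity permutation needs 32 write stages in total (one per warp), while the transpose pattern needs 1024, because all 32 writes of each warp are serialized on a single bank. A random permutation falls between the two.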

Similar resources

Chaotic Genetic Algorithm based on Explicit Memory with a new Strategy for Updating and Retrieval of Memory in Dynamic Environments

Many problems considered in optimization and learning assume that solutions exist in a dynamic environment. Hence, algorithms are required that adapt dynamically to the problem's conditions and search for new ones. In most cases, using information from the past allows quick adaptation after a change. This is the idea underlying the use of memory in this field, which involves key design issue...

Parallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform

There are different variants of the Particle Swarm Optimization (PSO) algorithm, such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating convergence. However, these algorithms are computationally intensive. The go...

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the CUDA environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...

Real-time Scheduling of a Flexible Manufacturing System using a Two-phase Machine Learning Algorithm

The static and analytic scheduling approach is very difficult to follow and is not always applicable in real time. Most scheduling algorithms are designed for an offline environment. However, real cases pose three challenges: First, the problem data of jobs are not known in advance. Second, most of the shop’s parameters tend to be stochastic. Third, th...

A Novel Intrusion Detection Systems based on Genetic Algorithms-suggested Features by the Means of Different Permutations of Labels’ Orders

Intrusion detection systems (IDS) that exploit machine learning techniques are able to diagnose the behavior of attack traffic. Because standard IDS benchmark datasets, such as KDD CUP 99 and NSL_KDD, have a relatively large number of features, feature selection methods play an important role. Optimization algorithms such as genetic algorithms (GA) are capable of finding a near-optimum combination of the fe...

Journal:
  • IEICE Transactions

Volume 96-D  Issue 

Pages  -

Publication date: 2013